Probabilistic Named Entity Verification

نویسندگان

  • Yi-Chung Lin
  • Peng-Hsiang Hung
چکیده

Named entity (NE) recognition is an important task for many natural language applications, such as Internet search engines, document indexing, information extraction and machine translation. Moreover, in oriental languages (such as Chinese, Japanese and Korean), NE recognition is even more important because it significantly affects the performance of word segmentation, the most fundamental task for understanding the texts in oriental languages. In this paper, a probabilistic verification model is designed for verifying the correctness of a named entity candidate. This model assesses the confidence level of a candidate not only according to the candidate’s structure but also according to its context. In our design, the clues for confidence measurement are collected from both positive and negative examples in the training data in a statistical manner. Experimental results show that the proposed method significantly improves the F-measure of Chinese personal name recognition from 86.5% to 94.4%.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Trust Based Probabilistic Method for Efficient Correctness Verification in Database Outsourcing

Correctness verification of query results is a significant challenge in database outsourcing. Most of the proposed approaches impose high overhead, which makes them impractical in real scenarios. Probabilistic approaches are proposed in order to reduce the computation overhead pertaining to the verification process. In this paper, we use the notion of trust as the basis of our probabilistic app...

متن کامل

Named Entity Recognition Using a Character-based Probabilistic Approach

We present a named entity recognition and classification system that uses only probabilistic character-level features. Classifications by multiple orthographic tries are combined in a hidden Markov model framework to incorporate both internal and contextual evidence. As part of the system, we perform a preprocessing stage in which capitalisation is restored to sentence-initial and all-caps word...

متن کامل

بهبود شناسایی موجودیت‌های نامدار فارسی با استفاده از کسره اضافه

Named entity recognition is a process in which the people’s names, name of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), date, currency and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...

متن کامل

Named Entity Learning and Verification: Expectation Maximization in Large Corpora

The regularity of named entities is used to learn names and to extract named entities. Having only a few name elements and a set of patterns the a lgorithm learns new names and its elements. A verification step assures quality using a large background corpus. Further improvement is reached through classifying the newly learnt elements on character level. Moreover, unsupervised rule learning is ...

متن کامل

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002